ABSTRACT

Language comprehension is an immensely complex 
process involving the dynamic interaction of diverse sources of 
knowledge. In order to model this process, tools are needed which 
allow detailed specification of the process components. In this 
paper, the essential characteristics of a model of reading 
comprehension are discussed in the context of the development of a 
computer model of the processes involved. Specific examples of text 
are analyzed to illustrate some of the complexities. It is argued 
that such a model would be valuable both in the construction of tests 
and instructional materials and in the systematic study of reading. 
Issues in implementing the computer model are also considered. 
(AA)

A. Overview

Language comprehension is an immensely complex process 
involving the dynamic interaction of diverse sources of 
knowledge. In order to model this process we must have 
tools which allow detailed specification of the process 
components. Traditional approaches to the study of reading 
have met with only limited success. Our approach is based 
on the realization that a much richer variety of 
intellectual tools is required if we are to make significant 
progress in our understanding of the reading process. We 
propose to develop a language for describing aspects of
reading comprehension which will facilitate construction of
tests and instructional materials, and make possible a more
systematic study of reading. The validity and usefulness of
this language will be explored via the implementation of a
computer model of aspects of comprehension for a particular
text.



B. Essential Characteristics of a Model of Reading
Comprehension

Before discussing the uses and implications of a model 
of reading comprehension, we will discuss three 
characteristics of such models which we take to be 
essential. Briefly stated, such models should be
multi-level, interactive, and hypothesis-based. Multi-level
implies that knowledge structures, which we call schemata,
at several different levels are actively used in the reading
process: traditionally-proposed levels include orthographic,
phonological, lexical, syntactic and semantic. Clearly,
higher-level knowledge sources such as inference rules
(Rieger, 1975), social action theory (Bruce and Schmidt,
1974; Bruce, 1975a; Schmidt, 1975), and expectations about
story structure (Rumelhart, 1975) are crucial components of
the skilled reading process.

Interactive reflects our conviction that these varied
knowledge sources interact in a heterarchical fashion; that
is, although they may naturally form a knowledge hierarchy
running from orthographic knowledge to expectations about
story structure, communication is not limited to adjacent
members of the hierarchy. The scenario proposed by some
psychologists (Gough, 1972; LaBerge and Samuels, 1974), which
involves a visual input progressing linearly through the
various knowledge levels to arrive finally at a "meaning",
is not considered plausible. Instead, we will consider
models which allow each knowledge source to put in its
"two-cents-worth" at various points in the progression to
comprehension of the text (Rumelhart, in press).

BBN Report No. 3427
Bolt Beranek and Newman Inc.

The coordination of this multitude of contributions
requires a central structure which collects evidence for
various interpretations of the text. We may generically
call such a structure a hypothesis and our models
hypothesis-based models (Rubin, 1975). Two characteristics
of hypotheses are important to mention here: (1) a
hypothesis represents a possible interpretation which may
later either be proven or disproven; (2) part of the
structure of a hypothesis is the specification of those
pieces of evidence which support or contradict it.
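
The two characteristics just listed can be made concrete with a small data structure. What follows is only a sketch, using Python as a notation; the names `Evidence`, `Hypothesis`, `add_evidence`, `score`, and `status`, and the simple weighting scheme, are illustrative assumptions and not part of the model proposed here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    """One observation contributed by some knowledge level (hypothetical)."""
    source: str          # e.g. "lexical", "syntactic", "social-action"
    description: str
    weight: float = 1.0  # assumed numeric strength, for illustration only


@dataclass
class Hypothesis:
    """A possible interpretation that later evidence may confirm or refute."""
    interpretation: str
    supporting: List[Evidence] = field(default_factory=list)
    contradicting: List[Evidence] = field(default_factory=list)

    def add_evidence(self, evidence: Evidence, supports: bool) -> None:
        # File the evidence on the side of the ledger it belongs to.
        (self.supporting if supports else self.contradicting).append(evidence)

    def score(self) -> float:
        # Net weight of support; an assumed measure, not the authors' own.
        return (sum(e.weight for e in self.supporting)
                - sum(e.weight for e in self.contradicting))

    def status(self) -> str:
        # With no evidence, or with balanced evidence, the hypothesis stays open.
        if not self.supporting and not self.contradicting:
            return "unspecified"
        net = self.score()
        if net > 0:
            return "supported"
        if net < 0:
            return "contradicted"
        return "conflicting"
```

Keeping the supporting and contradicting evidence inside the hypothesis itself is what would later allow a reader model to revisit an interpretation when new evidence arrives.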

Several existing reading theories share significant 
properties with the general form described here. Goodman 
(1973) describes receptive language processes in general as 
hypothesis-based, defining them as "cycles of sampling, 
predicting, testing and confirming." He recognizes three 
levels of cues which readers use: graphemic, syntactic and
semantic; these cue systems are used "simultaneously and
interdependently." Productive reading is seen as requiring
strategies which facilitate the selection of the most useful
cues.

Smith (1973) also emphasizes the contribution of what 
he terms "nonvisual" information to reading. This nonvisual 
knowledge includes what people already know about reading, 
language and the world in general. He argues particularly 
that reading is not decoding to sound, but rather that 
semantic and other nonvisual processes intercede between 
visual processes and reading aloud.

Perfetti (1975) proposes at least three levels of
sentence processing which obviously require corresponding
levels of knowledge. He also focuses more explicitly on how
the various component processes might interact, basing his
overall conclusions on the fact that all the processes which
occur during reading comprehension must share a "limited
capacity processor."

Though our approach shares much with that of these and 
other investigators, there are also some differences in 
emphasis. We propose to be more explicit in the designation
of different levels of knowledge sources, particularly in
the area Goodman terms "semantic." We recognize at least the
following types of knowledge: word semantics; knowledge of
logical inference rules; discourse semantics; knowledge of
social actions, their preconditions, and outcomes; story
schemata; understanding of various reading tasks; and
strategic knowledge about how to use each of the above
knowledge sources. In addition, we consider the explicit
definition of the interaction between these knowledge
components of the utmost importance and propose to
investigate the possibility that some unskilled reading may
be the result of not knowing how to use and interleave
knowledge, rather than of a lack of knowledge itself.

A final emphasis of our theory-building will be to take 
the notion of hypothesis seriously, in particular the
notions that a hypothesis may be wrong and that at various
points during the reading process it may be in a state of 
limbo, only partially specified, needing more evidence, or 
perhaps even uncertain because of conflicting evidence. 
Some researchers (e.g. Fodor, Bever, and Garrett, 1974) 
have tried to investigate the temporal course of reading 
comprehension with experiments such as phoneme monitoring; 
we intend to consider as well the possibility that as a 
consequence of some of the intermediate stages, the reader 
must "back up" and re-hypothesize about the meaning of a 
text. Goodman (1973) has noted that "proficient 
readers... are able to recover when they produce miscues 
which change the meaning in unacceptable ways." We will 
attempt to isolate these circumstances and define the
methods skilled readers use to debug their hypotheses. 

An important aspect of the above-described models which 
has practical implications for reading problems is the 
emphasis on structure-building. These structures or
schemata are important for both the final representation of 
the meaning of the text and the intermediate hypotheses 
which are so crucial to attaining the final goal. Three 
classes of knowledge are necessary for building such 
structures. First of all, a reader must have sufficient 
information about the types of schemata which are possible 
at each level, how to recognize them and what implications 
they have for further processing. Second, there is a whole
body of knowledge which we might term strategic. It consists
of information on how to use the structural knowledge, what
priorities to use in evaluating hypotheses and what form the
final "understood" structure should take. Third, there is
knowledge about the purpose of reading the particular text,
which can dramatically alter both the structural and
strategic knowledge used.

C. Why a Computer Model?

The most important motivation for turning to the
computer is the need for an appropriate language for
expressing the theoretical constructs underlying the
structure and use of schema theory and its interactions with
lower-level knowledge sources. The comprehensiveness and
utility of such a theory rests in part on how clearly one
can specify these interactions so that heterogeneous
knowledge sources cooperate to produce "comprehension." How
does one formally define and represent the strategic
knowledge controlling these interactions and verify that it
has the desired effect?

We can talk loosely about these control structure
issues in terms of passing messages back and forth between
the various process levels as a way of controlling the
interaction between high-level hypothesis-based processes
and bottom-up data-driven processes. An implemented
computer model, however, gives us much greater power for
precise expression. It would provide us with a way of
examining the consequences of modifying or deleting certain
strategic rules. By hand (with paper and pencil) we could
never keep track of the combinatorial interactions of all
the processes involved. Processes can interact in subtle
and unpredictable ways. A computer facility, however,
provides a system for studying these interactions carefully
and exhaustively.

We want to emphasize the importance of the influence of
computational concepts and of actually implementing portions
of our proposed model on the computer. To reiterate, a
computer model is valuable for several reasons:

   - [illegible] the cognitive processes we are exploring
     [illegible] paper [illegible] methods.

   - [illegible] objective test of a theory [illegible]
     prejudices and hopes.

   - [illegible] possible processing paths in a given situation
     rather than [illegible] which introspection discovers.

   - [illegible] measurable quantities such as [illegible]
     number and type of [illegible].

The BBN speech understanding system illustrates both
the consequences of attempting to implement a complex
language-processing model and some of the techniques
developed to deal with the problems encountered. One of the
concepts developed in this context to deal with the
interaction of low-level acoustic processes and higher-level
syntactic and semantic ones has been that of "verification."
The acoustic recognition procedures have a threshold by
which they eliminate marginally-matched words in their
preliminary processing of the input. If later the syntactic
or semantic component proposes a word which the acoustic
process did not discover in its initial scan, that word can
be explicitly matched with less stringent requirements. We
intend to take advantage of the insights already provided by
work on the speech understanding system in our work on
reading; such insights are indicative of the advantages of
building and using computer models.
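
The "verification" idea described above amounts to matching with two thresholds: a strict one for the bottom-up scan and a relaxed one for words proposed from above. The sketch below shows that logic only; it is not the BBN system's code, and the function names, the threshold values, and the match scores are all invented for illustration.

```python
from typing import Dict, Set

# Hypothetical match scores between the input and candidate words (0 to 1).
MATCH_SCORES: Dict[str, float] = {
    "floor": 0.91, "flaw": 0.74, "four": 0.55, "far": 0.20,
}

STRICT_THRESHOLD = 0.70   # assumed value for the preliminary bottom-up scan
RELAXED_THRESHOLD = 0.50  # assumed value when a higher level proposes a word


def initial_scan(scores: Dict[str, float]) -> Set[str]:
    """Bottom-up pass: keep only the words that match the input well."""
    return {word for word, s in scores.items() if s >= STRICT_THRESHOLD}


def verify(word: str, scores: Dict[str, float]) -> bool:
    """Top-down pass: re-match one proposed word with a relaxed requirement."""
    return scores.get(word, 0.0) >= RELAXED_THRESHOLD


def accept(word: str, found: Set[str], scores: Dict[str, float]) -> bool:
    """Accept a proposed word if the scan found it or verification succeeds."""
    return word in found or verify(word, scores)
```

With these invented numbers, the initial scan keeps "floor" and "flaw"; a syntactic or semantic proposal of "four" would pass verification, while "far" would still be rejected.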

D. Potential Uses of a Language for Describing Reading
Comprehension

A process-oriented language for describing reading 
comprehension has many potential uses in teaching and 
studying reading. Although it is not our goal to produce 
practical tools, we plan to test our model's feasibility by 
applying it to two real tasks: analyzing reading tests and 
scoring recall protocols. 

The assessment of reading comprehension would be
greatly facilitated by a reading test which could determine
whether or not a particular inferential skill had been
mastered by a reader. By representing in the computer all
of the relevant inference rules and world knowledge
applicable to a small piece of a particular text, we could
examine in detail all the possible applications and
interactions of the rules which could lead to answering test
questions. Each step of each solution path could be
recorded. By examining the resulting solution space (i.e.
the set of all solution paths) we could determine if all the
answer paths used a particular mediating inference skill
(such as rules about speech acts). These rules of inference
will not be restricted to "logical" rules but will include
such additional reasoning procedures which we know people
use. To some extent this will be achieved by building on
existing work on inference (e.g. Collins, Warnock, Aiello
and Miller, 1975). We certainly cannot anticipate all
conceivable ways a person might think in answering a given
question. However, we claim such a computer model could be
extremely useful for tracing out all the inferences - both
valid and hasty - that could follow from the knowledge base
of the model.
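
Enumerating a solution space and asking whether every path passes through a given rule can be sketched as a small backward-chaining search. This is a toy illustration under stated assumptions: the `Rule` format, the function names `solution_paths` and `always_requires`, and the sample rules and facts are hypothetical, and each path is summarized simply as the set of rule names it uses.

```python
from itertools import product
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    name: str
    premises: Tuple[str, ...]
    conclusion: str


def solution_paths(goal: str, facts: Set[str], rules: List[Rule],
                   pending: FrozenSet[str] = frozenset()) -> List[FrozenSet[str]]:
    """For every way of deriving `goal`, return the set of rule names used."""
    if goal in facts:
        return [frozenset()]      # given directly: no rules needed
    if goal in pending:
        return []                 # refuse circular derivations
    paths: List[FrozenSet[str]] = []
    for rule in rules:
        if rule.conclusion != goal:
            continue
        # Each premise must itself be derivable; combine their derivations.
        options = [solution_paths(p, facts, rules, pending | {goal})
                   for p in rule.premises]
        for combo in product(*options):
            paths.append(frozenset({rule.name}).union(*combo))
    return paths


def always_requires(rule_name: str, paths: List[FrozenSet[str]]) -> bool:
    """True if at least one path exists and every path uses the named rule."""
    return bool(paths) and all(rule_name in p for p in paths)


# Invented toy knowledge base, for illustration only.
RULES = [
    Rule("speech-act", ("q_form", "knows_answer"), "reprimand"),
    Rule("blame", ("reprimand",), "feels_blamed"),
    Rule("tone", ("meek_reply",), "feels_blamed"),
]
FACTS = {"q_form", "knows_answer", "meek_reply"}
```

In this toy case the goal `feels_blamed` has two solution paths, only one of which uses the speech-act rule, so a question with this goal would not isolate that skill. Removing `meek_reply` from the facts leaves a single path, and the speech-act rule becomes a required mediating step.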

This opens the way to more objective scaling on a set
of dimensions not normally used in test design, for example,
measuring the amount of world knowledge required to answer a
test question. One might attempt to measure this in terms
of the number of schemata invoked for a solution path and
their degree of embedding. We could investigate the depth
of inferencing required in terms of the shortest path, and
this in turn could be used as the basis for a measure of the
inferencing efficiency of particular solutions. We could
measure some of the short term memory demands in terms of
the amount of backtracking required or potentially required.
In addition we might explore the possibility of devising
more sophisticated measures of readability. Traditionally
these have been based on more or less crude measures of
sentence complexity, together with word frequency counts
(e.g., Dale and Chall, 1948; Bormuth, 196[?]). A
computer-based test analysis opens the door to much more
varied and meaningful measures.
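
One of the measures mentioned above, the depth of inferencing along the shortest path, could be computed roughly as follows. This is a minimal sketch: the rule format (a tuple of premises plus a conclusion) and the function name `inference_depths` are assumptions made for the example, and the rules in the usage note are placeholders rather than real inference rules.

```python
from typing import Dict, List, Set, Tuple

Rule = Tuple[Tuple[str, ...], str]   # (premises, conclusion); hypothetical format


def inference_depths(facts: Set[str], rules: List[Rule]) -> Dict[str, int]:
    """Smallest number of chained inference steps needed to reach each proposition."""
    depth: Dict[str, int] = {f: 0 for f in facts}   # stated facts need no inference
    changed = True
    while changed:                                  # repeat until nothing improves
        changed = False
        for premises, conclusion in rules:
            if all(p in depth for p in premises):
                # One step beyond the deepest premise this rule relies on.
                d = 1 + max((depth[p] for p in premises), default=0)
                if d < depth.get(conclusion, d + 1):
                    depth[conclusion] = d
                    changed = True
    return depth
```

For example, with the single fact `a` and the rules a→b, b→c, a→c, and (c, b)→d, the proposition `c` is reachable in one step even though a two-step route also exists, and `d` lies at depth two.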

A computer model also has great promise for providing a
partial solution to a long-standing problem in research in
psychology and education, namely the problem of how to
provide objective and reliable scores for free recall
protocols. Currently, in many pivotal recall experiments,
we must rely solely on the experimenter's good judgment in
naming and classifying differences between the story and its
recalled form. The partial solution we propose is to
utilize a symbiotic person/machine system. The role of the
computer model will be to specify a set of transformations
between the original text and the recall protocol which maps
one into the other (as far as possible). The role of the
human will be to determine what additional knowledge is
required to complete the mapping. In other words, s/he will
determine what knowledge is needed to account for
idiosyncratic distortions, as well as those which are more
widespread and predictable. Where [illegible] seems to
figure prominently in recall [illegible] be entered
into the system.

Using a computer model to help in scoring recall
protocols is a good test of the model and may provide new
insight into the analysis of recalls. A sophisticated
scoring procedure must operate on a context larger than just
isolated propositions of the text. For example, let us
consider the simple proposition (in a text)

"Jane was watering the flowers."

which a subject recalls as:

"A little girl was watering her flowers."

If our scoring algorithm focussed exclusively on one
proposition at a time (scoring proposition by proposition)
then the first noun phrase might be scored as an
over-generalization (Frederiksen, 1975). However, suppose
somewhere later in the text there is the sentence:

"Her mother called to her to come in and pick up her
dolls."

Then this later proposition interacts with the first (via an
inference rule) yielding a highly plausible inference that
Jane is, in fact, a little girl. It is precisely these
interactions that our process model can help
conceptualize [illegible].
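
The difference between scoring one proposition at a time and scoring with story-wide context can be illustrated with the Jane example. The sketch below is deliberately tiny and rests on assumptions: the tables `STATED` and `INFERRED`, the function `score_reference`, and the three category labels are hypothetical, and the inferred property is entered by hand rather than derived by an inference rule.

```python
from typing import Dict, Set

# Properties stated outright for each story entity (illustrative only).
STATED: Dict[str, Set[str]] = {"Jane": {"named Jane"}}

# Properties a reader could plausibly infer from later sentences, e.g. from
# "Her mother called to her to come in and pick up her dolls."
INFERRED: Dict[str, Set[str]] = {"Jane": {"little girl"}}


def score_reference(entity: str, recalled: str, use_context: bool) -> str:
    """Classify how a recalled description relates to the original entity."""
    if recalled in STATED.get(entity, set()):
        return "verbatim"
    if use_context and recalled in INFERRED.get(entity, set()):
        return "plausible inference"
    # Without context the scorer cannot account for the change; a
    # proposition-by-proposition scheme might call it an over-generalization.
    return "unexplained change"
```

Scoring "little girl" for Jane without context leaves the change unexplained, while the same recall scored with context is classified as a plausible inference.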



E. A "Simple" Example



In this section we will [illegible] example,
focusing on [illegible] test questions. The purpose of this
and the next section is to illustrate the processes we expect
our model to be able to explicate and, in particular,
[illegible] the non-trivial nature of the reasoning necessary
to understand even fairly simple stories. The passage chosen
is representative of the sort of text [illegible] be able to
handle: an apparently simple "story" and related multiple
choice questions. This example is taken from the Educational
Testing Service [illegible] Cooperative English Test of
reading comprehension [illegible]. The story is followed by
five questions to be answered on the basis of the passage. We
will first discuss [illegible] information might be
represented.

The inference mechanisms used in answering the test
questions are central to the issues we have just discussed:
understanding these processes will help to provide
techniques for the measurement of text difficulty and a
method for specifying what each question is actually
testing. In addition, we may expect some of the inferences
pinpointed by the test questions to show up in recalls of
the story, so our model will have to understand their
derivation.

One major distinction we will see in the discussion of
inferences below is that between linguistically-based and
real-world-based (extra-linguistic) knowledge and inference.
The former is language-specific knowledge which enables the
reader to go from the printed words to his/her
extra-linguistic knowledge. The latter is knowledge which
the reader has primarily developed through experience, such
as "when people yell, they are often angry."

Another point worth noting on a general level is the 
temporal nature of the comprehension process. Although the 
discussions of answering questions below do not explicitly
deal with intermediate stages of reading the story, the
order of sentences in a story obviously has an effect. For
example, the reader needs to construct many partial
hypotheses in the course of reading which cannot be
completely specified until more of the story is read. Part
of a reader's strategy may be to mark certain inferences as
"important to make as soon as enough information is
present." A story which starts out, "Her father was a
tyrant" should set up an expectation for the reader of
resolving the reference to "her". Such sequence-sensitive
issues are noted in several places in the discussion below.

The story we will use as the basis of our discussion is 
the following:

"Alice!" called a voice.

The effect on the reader and her
listener, both of whom were sitting on the
floor, was instantaneous. Each started
and sat rigidly intent for a moment; then,
as the sound of approaching footsteps was
heard, one girl hastily slipped a little
volume under the coverlet of the bed,
while the other sprang to her feet and in
a hurried, flustered way pretended to be
getting something out of a tall wardrobe.

[illegible] before the one who hid the book had
[illegible], a woman of fifty entered the
room and, after a glance, cried, "Alice!
How often have I told you not to sit on
the floor?"

"[illegible], Mommy," said Alice,
rising meekly, meantime casting a quick
glance at the bed to see how far its
smoothness had been disturbed.

"And still you continue such
unbecoming behavior."

"Oh, Mommy, but it is so nice!" cried
[illegible]. "[illegible] to sit on the
floor when you were fifteen?"






The first question is: 

1. Alice's companion was 
A a girl 
B her brother 
C the family dog 
D a doll 

The information necessary to answer this question is
essentially contained in the fragment "... one girl hastily
slipped a little volume under the coverlet of the bed, while
the other sprang to her feet ..." Using basically linguistic
knowledge about gender and the implications of "the other",
we can infer that two girls are involved in the action.
However, we only discover that one of them is, indeed, Alice
when the "woman of fifty" reprimands her by name, and that
discovery is contingent on understanding direct address,
another linguistic [illegible]. Note that the very first
sentence of the story sets up the expectation that someone
in the story is named Alice and that part of the
comprehension process will involve discovering who it is.

The second question is more complex in its involvement
of real-world knowledge:

2. [illegible] approaching footsteps, she probably [illegible]
E angry
F alarmed
G puzzled
H amused

Several pieces of evidence go into the inference that
Alice was most probably alarmed. At one level, we may look
at various words used to describe Alice; that she "started
and sat rigidly intent" certainly suggests alarm. But this
is not sufficient in itself, and comprehension requires
setting up a hypothesis designating this description and
supporting evidence. This hypothesis might be confirmed or
[illegible] sentences in the story. [illegible] later
[illegible] support to the [illegible] have some structure
which relates the two. The real reason that we believe Alice
is alarmed is that we know [illegible] the book hidden under
the covers. Many parts of the story contribute to the
"Alice [illegible] guilt" hypothesis: besides the
above-mentioned phrases, the fact that one girl hid the book
while the other girl pretended to be occupied with the
wardrobe is a link to the reader's non-linguistic knowledge
of such situations. It is the cumulative effect of such
details that supports the "Alice was feeling guilty"
hypothesis.

The third question is: 

3. We may infer that Alice is
A stupid and resentful
B very much in love
C fifteen years of age
D a spoiled child

The phrasing of this question alerts us to the fact
that inference will be important. In fact, deciding that
[illegible]. We decide she is fifteen because we know of a
strategy: "If you're being blamed for something, attempt to
elicit the sympathy of the blaming authority by getting them
to admit they've done the same thing." In order to infer that
this strategy is being applied here, we must first realize
that Alice is being blamed for sitting on the floor, a
conclusion which follows fairly directly from the mother's
first question and Alice's meek response. Then we must note
that in speaking to her mother, Alice has added a piece of
information to the description of her action which (under
this hypothetical persuasion strategy) indicates she is
herself fifteen. It is worthwhile noting that almost all of
these conclusions are based on the reader's understanding of
the implications of social actions and speech acts. For
example, although Alice's final remark is syntactically a
question, its real purpose is to persuade, not to gain
information. Neither is Alice's mother's "How often have I
told you not to sit on the floor?" really a question. The
inference of guilt is based on our knowledge of the social
conventions surrounding the speech acts as well as our
knowledge of mother/child relationships.

Given that we understand, at least sketchily, how we
might conclude that Alice is fifteen, we are still faced
with an important problem in understanding how we can answer
this question. The problem is one of control structure: how
do we choose this particular reasoning path out of all the
possible ones to follow? In this case, reasoning backward
from the question [illegible] is clearly important. Good
test-takers read over the possible answers to
multiple-choice questions and use them to guide their
detailed thinking. In this case, in considering answer C the
reader's attention can be directed to the final paragraph
where there is a reference to age, and reasoning continues
from there. To understand the distinction between inferences
made while reading the story and those made in response to
questions, consider how one might describe Alice just after
reading the story compared with a description given after
answering the questions. Mention of Alice's age would be
much more common in the second description; although the
information necessary to infer her age is present in the
story itself, the actual inference is probably not made (or
not remembered) unless explicitly asked for.

There is more evidence of question-directed inference
in the fourth question:

4. [illegible] evidently
E reading to herself
F reading aloud
G lying in bed
H making her bed

We know fairly directly that a "reading aloud" is
taking place from the phrase "the reader and her listener."
(This is not really a trivial inference and working it out
in detail might make a good first goal for a
representation.) By following the chain of references
through the next several sentences, we can infer that it was
Alice who hid the book. However, we have no reason to
believe that Alice was reading rather than listening; the
fact that she hid the book is suggestive, but not
confirming. A "process of elimination" strategy is
necessary to answer the question. In this case, the other
three possible answers are easy to rule out and we conclude
that it was Alice. One implication of this example is that
a child may do better on a reading test because s/he uses
certain strategies which might be termed test-taking skills.
These strategies are examples of reading with a goal, and
they must be considered part of the knowledge necessary to
perform well on such reading tests. The existence of such
question-based inference strategies also points out a
weakness in determining the difficulty of a text in vacuo,
i.e., outside of a task definition. It is easier to check
whether or not a given fact is consistent with a story than
it is to answer a more general question.
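
The "process of elimination" strategy discussed for the fourth question can be written down as a very small procedure. The sketch below is illustrative only: the function `eliminate` is hypothetical, the wording of option E is taken from the later discussion of the subject's answer rather than from a legible copy of the test, and the set of ruled-out answers is entered by hand instead of being derived from a representation of the story.

```python
from typing import Callable, Dict, Optional


def eliminate(options: Dict[str, str],
              contradicted: Callable[[str], bool]) -> Optional[str]:
    """Return the one option label left after ruling out contradicted answers."""
    remaining = [label for label, text in options.items()
                 if not contradicted(text)]
    # The strategy only yields an answer when exactly one option survives.
    return remaining[0] if len(remaining) == 1 else None


# Answer options for question 4 (option E's wording is an assumption).
OPTIONS = {
    "E": "reading to herself",
    "F": "reading aloud",
    "G": "lying in bed",
    "H": "making her bed",
}

# Hand-coded judgments about which answers the story contradicts,
# e.g. "lying in bed" conflicts with both girls sitting on the floor.
RULED_OUT = {"reading to herself", "lying in bed", "making her bed"}
```

Calling `eliminate(OPTIONS, lambda text: text in RULED_OUT)` returns "F". If fewer than three options can be ruled out, the function returns `None`, which corresponds to the case where elimination alone does not settle the question.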

Finally, the fifth question: 

5. Alice was worried about the appearance of the bed 
because 

A she had neglected to make it up 

B her companion had been sitting on it 

C her companion was hiding under it 

D she was afraid her mother might find the book 

Answering this question is closely related to answering
questions 2 and 3; it requires a global understanding of the
story and the interaction between Alice and her mother.
Even understanding that Alice was worried about the bed's
appearance requires being able to interpret the story in
terms of guilt, wrong-doing and [illegible] and anger.
[illegible] to infer that Alice hid the book under the
[illegible] more to comprehend [illegible] relationship of
[illegible] her mother [illegible]

Conclusion: Alice was afraid [illegible] answered without a
[illegible]. Just as essential is [illegible] motivation.
[illegible] sentences [illegible], and the implications of
[illegible]. A preliminary exploration of [illegible]
representation [illegible] illustrates [illegible] of the
necessary inferences and our preliminary approach to
handling them.

F. A "Garden Path" Analysis

To illustrate the use of and need for a detailed
process model of text comprehension we will now examine an
example of a subject "comprehending" the Alice story. An
adult was read the story, asked the questions, and then
asked to summarize the episode. The example shows how a
single overlooked fact leads to catastrophe in terms of the
answers to the multiple choice questions. This observation
alone is surprising, but it also nicely illustrates the
far-reaching consequences that a single piece of data can
have in a hypothesis-driven scheme of reading.

The subject answered two out of the five questions
"correctly," for a "comprehension" score of 40%. Examining
the hypotheses this subject reported in her summary, we
found that she had carefully and properly articulated a
"garden path" hypothesis (that is, one which is plausible
except for some easily-overlooked piece of refuting
evidence).

There was only one linguistically-based mistake: she
failed to connect "one girl ... while the other ..." with
the idea of two girls. Therefore in her recall, Alice both
hid the book and went to the wardrobe. Like most readers
the subject felt obliged to account for why the book was
secret; she assumed that it had to be a diary. The
sequencing of hypotheses along the way to comprehension can
sometimes drastically alter the final understanding of the
text. This subject paid more attention to Alice and her
[illegible] understanding why she hid the book than do most
readers; usually readers think the mother would consider
reading the book to be sufficient cause for blame. Also,
she reported getting the idea that the voice belonged to
Alice's mother because it called (and didn't yell or cry),
and on reading tests "Mothers always call, children always
yell." Most subjects would have to wait until Alice
addresses her as "Mommy."

Then came the first question. One of the answers has
to be right, and who would you read your secret diary to? A
doll is safest. Little girls do read to their dolls, and a
fantasy world is the safest place for secrets. Since the
subject didn't identify "the reader and her listener" with
"one girl ... while the other", the usual path to answering
this question was blocked. Therefore she was obliged to
rely on a longer chain of more tenuous question-time
inferences.

The second question was answered conventionally; as 
detailed in the last section, Alice hurried to hide the 
book, so she must have been alarmed.

The third question, beginning "We may infer that,"
suggested to the subject that further inferences were called
for. Having already concluded that Alice was fifteen years
old, she regarded that conclusion as explicitly stated, not
inferred. Here again, the supposition that Alice was 
reading her secret diary figures prominently in the audit 
trail of steps to the conclusion. Alice could most 
plausibly be "very much in love" because that would be 
recorded in her diary, and a girl of fifteen would 
especially not want her mother to know that. 

The fourth question was answered reasonably given the 
episodic structure set up to answer the first question. 
This structure says that when her name was called, Alice was 
reading to "her listener," the doll. The subject chose to 
describe it as "reading to herself" rather than "reading 
aloud" because the doll was only being read to in Alice's 
imagination. "Alice was evidently reading to herself." 

The fifth question, like the second, tests the reader's 
understanding of Alice's fear of discovery. The subject 
displayed no misunderstanding here. 

So a deeper analysis of reading done by the subject 
revealed much better reading skills than were measured by 
the five questions. Just one omission crept in when she 
missed "one girl while the other," possibly because the 
clause in the ellipsis requires so much processing, possibly 
because, as she later said, the phrase "the reader and her 
listener" implied to her that one was capable of talking, 
while the other was not. The rest of her "troubles" were 
all the result of a behavior that actually is part of 
skilled comprehension, the amalgamation of explicit and 
implicit information in the narrative. 

The multiple-choice design of the test also contributes 
unnecessarily to the confusion since one of the four 
sentence completions must be correct, and that sentence is 
bound to have presuppositions which will get integrated into 
the reader's overall story interpretation. 

Thus, a "wrong" answer for question 1 strengthened the 
diary hypothesis, which was therefore trusted again in 
question 3. Her answer to question 4 was based on her 
answer to question 1. Indeed, from the subject's point of 
view all of the questions were based on understanding 
Alice's diary: its audience, its import, its content, and 
its secrecy. Yet, far from failing to understand the story, 
the subject demonstrated great skill (if perhaps a little 
haste) in jumping to conclusions. She "deserved" to have 
missed only the first question which tested whether the 
reference to the two girls had been established. 

We believe that only by carefully representing the 
linguistically- and conceptually-based knowledge used in 
reading to the depth described can we faithfully perceive 
what skills are involved in reading, where they are absent, 
and even eventually how and in what order they may be 
taught. This is a detailed scientific undertaking which 
requires the use of a computer to marshal all the relevant 
information at once. It is one thing to build a speculative 
blackboard model of the information used in comprehending a 
single story; it is quite another to design a process with 
the clarity of attention to find its way through the space 
of possible reasoning steps to an actual scenario of text 
comprehension. As we saw with the above example, it is not 
the end result, but how you get there that counts. 

G. Exploring Representation Issues 

The development of an improved language for describing 
comprehension requires major inputs from a variety of 
sources. Some of the effort must be directed towards 
gathering and analysing previous work on representations of 
knowledge, as in Bruce (1975b). Some must go into informal 
recall and question-answering experiments of the kind 
discussed in the previous section, followed later by more 
rigorous tests. Much of the work is purely of the "pencil 
and paper" variety, wherein notions of representation, 
control structure and so on, are examined for adequacy and 
consistency. This type of work is exemplified in Rubin 
(1975), Bruce (1972), Nash-Webber and Bruce (1976), and 
Bobrow and Brown (1976). Finally, much of our work will be 
done in the context of computer modeling. Later in this 
section we illustrate the general form of our techniques by 
means of a tentative (and limited) analysis of one line of 
the Alice story. 

How can we characterize the diverse knowledge needed 
for reading so that it can be used by a computer program? 
How do we make the knowledge explicit so that the resulting 
model tells us something about reading comprehension? Can 
the knowledge representation structures be made flexible 
enough to accommodate varying theories about reading so that 
they can be compared? Answering these and related questions 
will be a major focus for our work. 

Previous and ongoing work at BBN which deals with 
various areas connected with language provides us with a 
powerful set of technical tools. This work includes 
reliable and established software for handling semantic 
networks (used extensively in the SCHOLAR system (Carbonell 
and Collins, 1974; Collins, et al., 1975) and the SOPHIE 
system (Brown and Burton, 1975)) and for building augmented 
transition network parsers, as well as techniques for using 
and building procedural representations. In addition the 
BBN speech understanding project (see Nash-Webber and Bruce, 
1976) has some 50 person-years of experience in dealing with 
interacting processes. Tools and experience of this kind 
mean that the design and implementation of our model will 
not require us to start from the very beginning. 

In order to show in a more concrete, albeit simplified 
manner, what such a model might look like, we will use the 
notion of boxes which contain information and point to other 
boxes. In fact many of these boxes can be regarded as 
schemata, but they also represent high level control 
processes, temporary storage locations, etc. 

We need to represent in boxes all of the orthographic, 
phonemic, syntactic, semantic and pragmatic information 
which might be retained and used by a reader of a text. We 
also need to represent a substantial amount of knowledge not 
given by the text, e.g., schemata about people, places, and 
things, knowledge of speech acts and social actions, 
knowledge about the context and purpose of the reading task, 
and so on. Given this knowledge representation we can then 
attempt to analyse the text, the answering of questions on 
the text, and recall protocols of the text. 

For example, consider the first word in the Alice 
Story: 

"Alice!" 

A possible box representation for this word (actually, a 
manifestation of the word, "Alice", which is itself distinct 
from the concept, <Alice>) is shown in Figure 1. Note that 
this box becomes meaningful only when we show the boxes it 
points to. For example, Word1 is a manifestation of 
"Alice", as shown in Figure 2. The positional significance 
of "Alice" is indicated by the FirstWordOf pointer. One 
indication that such information is retained comes from 
informal recall experiments we have done on the Alice story. 
In every case "Alice" was remembered correctly as the first 
word. 

Now, seeing "Alice!" at the beginning of a text, the 
reader is likely to infer that there is a person, whose 
(first) name is "Alice". Whether this person is being 
called to, shouted at, or just named is not clear without 
reading more. Still, the reader can build the structure 
shown in Figure 3. Finally, the reader produces structures 
for the syntax of "Alice!" and for the utterance itself (as 
opposed to the words making up the utterance). 

It should be clear at this point that for a single 
manifestation of a word there is a lot of information to 
organize and remember. One thing that helps is that these 
boxes are highly interconnected, forming a network-like 
structure as shown in Figure 4. The box labeled "Tree1" is 
simply the top box for a whole set of boxes representing 
pertinent syntactic information (e.g. parse trees). 
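
As a rough illustration only (the figures are not reproduced 
here, and this is not the proposed implementation), the idea 
of boxes that contain information and point to other boxes 
can be sketched in Python. Word1, Tree1 and FirstWordOf are 
named in the text; every other box name and pointer label 
below is a hypothetical placeholder. 

```python
class Box:
    """A named box whose labeled pointers lead to other boxes."""

    def __init__(self, name):
        self.name = name
        self.pointers = {}  # pointer label -> target Box

    def point(self, label, target):
        """Add a labeled pointer and return self so calls can be chained."""
        self.pointers[label] = target
        return self

    def follow(self, *labels):
        """Follow a chain of pointer labels starting at this box."""
        box = self
        for label in labels:
            box = box.pointers[label]
        return box


def reachable(start):
    """Return the sorted names of every box reachable from `start`."""
    seen, stack = {}, [start]
    while stack:
        box = stack.pop()
        if id(box) not in seen:
            seen[id(box)] = box.name
            stack.extend(box.pointers.values())
    return sorted(seen.values())


# Hypothetical names, except Word1, Tree1 and FirstWordOf.
text1 = Box("Text1")                      # the text as a whole
alice_concept = Box("<Alice>")            # the concept
alice_word = Box('"Alice"').point("Names", alice_concept)  # the word
tree1 = Box("Tree1")                      # top box for syntactic information
word1 = Box("Word1")                      # this manifestation of the word
word1.point("ManifestationOf", alice_word)
word1.point("FirstWordOf", text1)
word1.point("Syntax", tree1)

print(reachable(word1))
```

A box such as Word1 carries almost no meaning by itself; what 
it contributes is recovered by following its pointers, for 
instance `word1.follow("ManifestationOf", "Names")` to reach 
the concept. 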

A complete representation of even the first sentence of 
the Alice story would not be appropriate here. Instead let 
us assume that the details at the orthographic, phonemic and 
syntactic levels are given and focus on the conceptual 
representation, remembering, however, that the interactions 
across levels may be crucial to comprehension. For the 
first sentence we might get the conceptual representation 
shown in Figure 5. (Reverse pointers are omitted for the 
sake of clarity.) This representation is basically a schema 
in which an action is assumed to have various slots which 
need to be filled, the notions of "Mtrans", "Speak", and 
"Conscious Processor" being taken from Schank, 1975. Note 
that this representation allows different interpretations of 
the first sentence. For instance, the voice could be 
calling to Alice, or merely invoking her name (as in anger 
at a discovered wrong). 
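
To make the notion of a schema with unfilled slots concrete, 
here is a minimal Python sketch. It assumes a simple 
slot-and-filler structure; the slot names are hypothetical 
(Figure 5 is not reproduced here), and only "Mtrans" and 
"Speak" are taken from the text. The point is that one open 
schema can be completed in more than one way, giving the 
different interpretations mentioned above. 

```python
class Schema:
    """An action schema: a kind plus named slots; None marks an unfilled slot."""

    def __init__(self, kind, **slots):
        self.kind = kind
        self.slots = dict(slots)

    def unfilled(self):
        """Return the sorted names of the slots that still need a filler."""
        return sorted(name for name, value in self.slots.items() if value is None)

    def fill(self, **values):
        """Return a new schema with some slots filled; the original stays open."""
        unknown = set(values) - set(self.slots)
        if unknown:
            raise KeyError(f"no such slot(s): {sorted(unknown)}")
        return Schema(self.kind, **{**self.slots, **values})


# Hypothetical slot names for the first sentence of the story.
first_sentence = Schema(
    "Mtrans",
    actor=None,          # whoever owns the voice, not yet known
    means="Speak",
    content='"Alice!"',
    recipient=None,
    purpose=None,
)

# Two readings built from the same open schema.
calling = first_sentence.fill(recipient="<Alice>", purpose="get her attention")
invoking = first_sentence.fill(purpose="express anger at a discovered wrong")

print(first_sentence.unfilled())
print(calling.unfilled())
print(invoking.unfilled())
```

Because `fill` returns a copy, both readings can be held at 
once until later text decides between them. 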

In addition to representations of the text structure 
(including such immediate inferences as "a voice that can 
utter 'Alice' probably belongs to a person") there must be 
representations of relevant world knowledge. For example, 
the speech act of calling to someone has a number of 
presuppositions and expectations associated with it which 
can be used in later structuring of the text. This and 
similar kinds of knowledge must be readily available for 
comprehension to occur. 
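
One way to picture such world knowledge is as a stored entry 
whose expectations are checked against later events in the 
text. The Python sketch below is an assumption-laden 
illustration: the wording of the presuppositions, the event 
type labels, and the function name are all hypothetical. The 
two later events are the ones mentioned in the analysis above 
(Alice hiding the book and addressing the voice as "Mommy"). 

```python
# Hypothetical world-knowledge entry for the speech act of calling to someone.
CALL_TO = {
    "presuppositions": [
        "the caller believes the hearer is within earshot",
        "the caller wants the hearer's attention",
    ],
    # Types of event that calling to someone leads a reader to expect.
    "expectations": {"response", "request"},
}


def structure_later_events(speech_act, later_events):
    """Split later events into those the speech act predicted and the rest."""
    expected, unexpected = [], []
    for event_type, description in later_events:
        if event_type in speech_act["expectations"]:
            expected.append(description)
        else:
            unexpected.append(description)
    return expected, unexpected


# Event type labels are illustrative assumptions.
later = [
    ("concealment", "Alice hurries to hide the book"),
    ("response", 'Alice addresses the voice as "Mommy"'),
]
expected, unexpected = structure_later_events(CALL_TO, later)
print(expected)
print(unexpected)
```

Events that the entry did not predict are the ones that call 
for further inference, which is where other knowledge must 
be brought in. 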

The preceding examples are admittedly sketchy and are 
intended to show only some of the factors we want to 
consider in our knowledge representations. Our research 
will be guided by the demands of actual children's texts and 
questions such as: (1) Does the representation demonstrate 
how an inference could be made? (2) Can a class of inference 
failures be described in terms of general features of the 
model? (3) Can general features of the model be translated 
into prescriptions for test and training material design, 
research procedures, and implications for teaching? 

H. Implementation Issues 

The first version of our computer model will be used to 
explore representation and inference control issues which 
would have a major impact on later versions. We will work 
with texts selected to share a common body of world 
knowledge. Thus we will be able to concentrate on general 
representation issues, rather than the specifics of several 
unrelated texts. 

The programs will be written in INTERLISP so that we 
can quickly incorporate parts of existing programs (e.g., 
the BBN speech understanding system) which prove useful. A 
major example in this category is SEMNET, a program which 
makes it easy to build, change, search through and print out 
a semantic network. 

At first we will use formal representations of the text 
rather than the raw English. Although both parsing and 
generation programs are available to us, and could be used 
at a later date, we feel that the main focus of this 
programming work ought to be on comprehension problems and 
not on input/output questions. On the other hand the 
formal representation used must allow for expression of 
surface syntactic or orthographic information which might 
interact with comprehension processes. 

A sketch of the steps the program should follow in 
specifying the difficulty of a test question and what 
capabilities it is testing is as follows: First a formal 
representation of a text is read in. Then a structure is 
built in which some inferences have been made to give 
coherence to the text. Next the program is asked to answer 
a question. In the process of answering the question, the 
program maintains an audit trail which shows just which 
inferences of each kind were used. This audit trail gives a 
measure of the difficulty of the question with respect to 
the text for a given body of world knowledge and inference 
strategies. Changes in the question, the text or the 
stored knowledge can alter the audit trail significantly 
and thus show in a precise way the effects of text and task 
characteristics. 
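
The steps just listed can be sketched in Python under strong 
simplifying assumptions: facts are plain strings, each rule 
is a (kind, premises, conclusion) triple, and all inferences 
are made when the structure is built rather than partly at 
question time. The rules, fact wordings, and function names 
are hypothetical and serve only to show how an audit trail 
could yield a difficulty measure. 

```python
from collections import Counter

# Hypothetical rules: (kind of inference, premises, conclusion).
RULES = [
    ("world-knowledge", {"Alice hurried to hide the book"}, "Alice feared discovery"),
    ("world-knowledge", {"Alice feared discovery"}, "Alice was alarmed"),
    ("speech-act", {"a voice called 'Alice!'"}, "someone wants Alice's attention"),
]


def build_structure(text_facts, rules):
    """Forward-chain to a fixed point, recording the rule behind each new fact."""
    facts, derived_by = set(text_facts), {}
    changed = True
    while changed:
        changed = False
        for kind, premises, conclusion in rules:
            if conclusion not in facts and premises <= facts:
                facts.add(conclusion)
                derived_by[conclusion] = (kind, premises)
                changed = True
    return facts, derived_by


def audit_trail(fact, derived_by):
    """List the inferences behind `fact`, earliest first; text facts add nothing."""
    if fact not in derived_by:
        return []
    kind, premises = derived_by[fact]
    trail = []
    for premise in sorted(premises):
        for step in audit_trail(premise, derived_by):
            if step not in trail:
                trail.append(step)
    trail.append((kind, fact))
    return trail


def answer(question_fact, facts, derived_by):
    """Return (audit trail, count of inferences per kind), or None if unsupported."""
    if question_fact not in facts:
        return None
    trail = audit_trail(question_fact, derived_by)
    return trail, Counter(kind for kind, _ in trail)


text = {"a voice called 'Alice!'", "Alice hurried to hide the book"}
facts, derived_by = build_structure(text, RULES)
trail, difficulty = answer("Alice was alarmed", facts, derived_by)
print(trail)
print(dict(difficulty))
```

Removing a rule or a text fact changes the trail or makes 
the question unanswerable, which is the sense in which the 
trail reflects text and task characteristics. 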

A generalization of the question answering problem is 
that of text comparison. Given a text and a recalled 
version of it, the program will apply the same inference 
rules and knowledge in an attempt to convert the text into 
its recalled version. Again, the audit trail gives a 
precise objective measure of the difficulty of the 
transformation task, and thus, in this case, of the distance 
between the two versions. 
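
Text comparison can be illustrated in the same simplified 
style. The Python sketch below assumes facts are strings and 
rules are (kind, premises, conclusion) triples, and it takes 
the number of inference steps needed to reach the recalled 
statements as the distance. All names, rules, and fact 
wordings are hypothetical; recalled statements that the 
rules cannot reach are reported separately instead of being 
counted. 

```python
def close(facts, rules):
    """Forward-chain to a fixed point; map each inferred fact to (kind, premises)."""
    known, derived_by = set(facts), {}
    changed = True
    while changed:
        changed = False
        for kind, premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                derived_by[conclusion] = (kind, premises)
                changed = True
    return derived_by


def steps_for(fact, derived_by, trail):
    """Append to `trail` the inference steps needed to reach `fact`."""
    if fact in derived_by and (derived_by[fact][0], fact) not in trail:
        kind, premises = derived_by[fact]
        for premise in sorted(premises):
            steps_for(premise, derived_by, trail)
        trail.append((kind, fact))


def compare(text, recalled, rules):
    """Return (audit trail, unexplained recalled facts, omitted text facts)."""
    derived_by = close(text, rules)
    trail, unexplained = [], []
    for fact in sorted(recalled - text):
        if fact in derived_by:
            steps_for(fact, derived_by, trail)
        else:
            unexplained.append(fact)
    return trail, unexplained, sorted(text - recalled)


# Hypothetical rules and statements.
RULES = [
    ("world-knowledge", {"Alice hurried to hide the book"}, "Alice feared discovery"),
    ("world-knowledge", {"Alice feared discovery"}, "Alice was alarmed"),
]
text = {"a voice called 'Alice!'", "Alice hurried to hide the book"}
recalled = {"a voice called 'Alice!'", "Alice was alarmed",
            "Alice was reading her diary"}

trail, unexplained, omitted = compare(text, recalled, RULES)
distance = len(trail)
print(distance, unexplained, omitted)
```

An unexplained statement such as the diary reading signals 
knowledge or inference rules that the stored model lacks. 
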
I. Conclusion 



Some caution should be exercised in interpreting what 
we have proposed. Our programming efforts will be directed 
towards implementing a restricted model which represents 
selected crucial components. We will incorporate only the 
knowledge required to handle a few simple texts (e.g., 
sample test items). For the model to be of more general use 
would require the incorporation of an enormous amount of 
world knowledge which is not a realistic undertaking in the 
foreseeable future. However, once a limited-knowledge 
version is implemented and working there are several 
possibilities that could be pursued. For any particular use 
it could be "primed" with appropriate knowledge as, for 
example, when one might wish to use it to assist in 
providing objective scores on recall protocols. It could 
also be used to handle different texts in the same domain. 

Understanding the reading process involves having 
precise conceptions about the way in which various knowledge 
sources and critical processes interact. Reading 
comprehension is a dynamic process; understanding it 
requires models with dynamic characteristics. The computer 
is the best way we know of to represent such 
characteristics, and programs of the kind we propose 
represent the best way we know of to precisely specify their 
interactions. 